A Hybrid Approach to Compiling Bilingual Dictionaries of Medical Terms from Parallel Corpora
Authors
Abstract
Existing bilingual dictionaries of technical terms suffer from limited coverage and are available for only a small number of language pairs. In response to these problems, we present a method for automatically constructing and updating bilingual dictionaries of medical terms by exploiting parallel corpora. We focus on the extraction of multiword terms, which constitute a challenging problem for term alignment algorithms. We apply our method to two low-resourced language pairs, namely English-Greek and English-Romanian, for which such resources did not previously exist in the medical domain. Our approach combines two term alignment models to improve the accuracy of the extracted medical term translations. Evaluation results show that the precision of our method is 86% and 81% for English-Greek and English-Romanian, respectively, considering only the highest-ranked candidate translation.
Similar resources
Extracting Bilingual Persian Italian Lexicon from Comparable Corpora Using Different Types of Seed Dictionaries
Ebrahim Ansari et al. 2017. Extracting bilingual Persian Italian lexicon from comparable corpora using different types of seed dictionaries. In "Applications of Comparable Corpora" edited book Berlin Linguistic Press (ed.). Bilingual dictionaries are very important in various fields of natural language processing. In recent years, research on extracting new bilingual lex...
Utilizing Contextually Relevant Terms in Bilingual Lexicon Extraction
This paper demonstrates an efficient technique for extracting bilingual word pairs from non-parallel but comparable corpora. Instead of using the common approach of taking high-frequency words to build up the initial bilingual lexicon, we show that contextually relevant terms that co-occur with cognate pairs can be efficiently utilized to build a bilingual dictionary. The result shows that our model...
Building Bilingual Dictionaries from Parallel Web Documents
In this paper we describe a system for automatically constructing a bilingual dictionary for cross-language information retrieval applications. We describe how we automatically target candidate parallel documents, filter the candidate documents and process them to create parallel sentences. The parallel sentences are then automatically translated using an adaptation of the EMIM technique and a ...
Compiling French-Japanese Terminologies from the Web
We propose a method for compiling bilingual terminologies of multi-word terms (MWTs) for given translation pairs of seed terms. Traditional methods for bilingual terminology compilation exploit parallel texts, while the more recent ones have focused on comparable corpora. We use bilingual corpora collected from the web and tailor-made for the seed terms. For each language, we extract from the c...
Combining String and Context Similarity for Bilingual Term Alignment from Comparable Corpora
Automatically compiling bilingual dictionaries of technical terms from comparable corpora is a challenging problem, yet with many potential applications. In this paper, we exploit two independent observations about term translations: (a) terms are often formed by corresponding sub-lexical units across languages and (b) a term and its translation tend to appear in similar lexical context. Based ...
Publication date: 2014